1 Installation

First, install the requirements:

pip install -r requirements.txt

This is included in the zip file, however they are listed here for transparency:

plotly
tqdm
numpy
matplotlib
scipy
astropy
requests-html
astroquery
beautifulsoup4

1.1 Project Layout (for now)

Next, start working relative to the jplspec (for example if you were working on a project called myproject.py):

.
├── jplspec
│   ├── __init__.py
│   ├── molecules.py
│   ├── setcover.py
│   ├── spectraldata.py
│   └── utility.py
├── project.py
├── README.md
└── requirements.txt

Now we can start running the code!

2 Loading the data:

First, we import the library:

import jplspec as jpl

2.1 The SpectralFile and SpectralData classes

Now we can reference the file with the SpectralFile class, and then read it into a SpectralData object:

path ="Titan/Titan/Win0.clean1.contsub_Jy.rest.scom.c.txt"
myfile = jpl.SpectralFile(path)
print(myfile)
#> SpectralFile(file_location='Titan/Titan/Win0.clean1.contsub_Jy.rest.scom.c.txt', name=None, time=None, date=None, temp=None, units='GHz')
data = jpl.read(myfile)
print(data)
#> SpectralData(file_location='Titan/Titan/Win0.clean1.contsub_Jy.rest.scom.c.txt', name=None, time=None, date=None, temp=None, units='GHz', data=array([[ 2.2147966e+02,  2.7446039e-03],
#>        [ 2.2148015e+02, -1.5861880e-02],
#>        [ 2.2148064e+02, -1.2065790e-02],
#>        ...,
#>        [ 2.2335333e+02, -1.0672529e-03],
#>        [ 2.2335382e+02, -1.3607520e-03],
#>        [ 2.2335431e+02, -2.9259811e-03]], dtype=float32), frequency=array([221.47966, 221.48015, 221.48064, ..., 223.35333, 223.35382,
#>        223.35431], dtype=float32), intensity=array([ 0.0027446 , -0.01586188, -0.01206579, ..., -0.00106725,
#>        -0.00136075, -0.00292598], dtype=float32), num_entries=3840)

Note we can wrap this all in one step with:

data = jpl.read(jpl.SpectralFile(path))

Please also note optional keyword arguments: name, temperature, time, data, units. These are not necessary, however they may be useful from an organizational perspective. All attributes of all objects in the jplspec library can be accessed via object.attribute

3 Spike Identification and Database Querying

Now we can identify the spikes!

3.1 Spike identification

Identified using a three sigma threshold. This can be modified by writing a custom function that returns indices of true cases using np.where, and using the spike_method argument

spikes = jpl.identify_spikes(data)
jpl.plot_spikes(spikes)

print(spikes)
#> Spike(data=array([[ 2.2147966e+02,  2.7446039e-03],
#>        [ 2.2148015e+02, -1.5861880e-02],
#>        [ 2.2148064e+02, -1.2065790e-02],
#>        ...,
#>        [ 2.2335333e+02, -1.0672529e-03],
#>        [ 2.2335382e+02, -1.3607520e-03],
#>        [ 2.2335431e+02, -2.9259811e-03]], dtype=float32), frequency=array([221.52263, 221.54118, 221.54167, 221.59442, 221.62518, 221.62567,
#>        221.62616, 221.6286 , 221.6325 , 221.69305, 221.7248 , 221.76385,
#>        221.76434, 221.98653, 221.98897, 221.98946, 222.01389, 222.01437,
#>        222.01486, 222.01633, 222.01682, 222.0173 , 222.0588 , 222.06076,
#>        222.06125, 222.08371, 222.0842 , 222.09494, 222.09543, 222.09592,
#>        222.09836, 222.09885, 222.09933, 222.09982, 222.10031, 222.1008 ,
#>        222.10129, 222.10178, 222.10715, 222.10764, 222.10959, 222.11203,
#>        222.11252, 222.11642, 222.11838, 222.11887, 222.1218 , 222.1257 ,
#>        222.12619, 222.12668, 222.12717, 222.12766, 222.12814, 222.12863,
#>        222.12912, 222.12961, 222.1301 , 222.13058, 222.13107, 222.13156,
#>        222.13205, 222.13351, 222.13547, 222.13596, 222.1384 , 222.13889,
#>        222.13937, 222.13986, 222.14035, 222.14084, 222.14133, 222.14182,
#>        222.1423 , 222.14279, 222.14719, 222.14767, 222.14816, 222.14865,
#>        222.14914, 222.14963, 222.15012, 222.1506 , 222.1511 , 222.15158,
#>        222.15207, 222.15256, 222.15305, 222.15353, 222.15402, 222.15549,
#>        222.15695, 222.15744, 222.15793, 222.15842, 222.1589 , 222.1594 ,
#>        222.15988, 222.16037, 222.16086, 222.16135, 222.16183, 222.16232,
#>        222.16281, 222.1633 , 222.16379, 222.16428, 222.16476, 222.16525,
#>        222.16574, 222.16623, 222.16672, 222.1672 , 222.1677 , 222.16818,
#>        222.16867, 222.16916, 222.16965, 222.17014, 222.17062, 222.17111,
#>        222.1716 , 222.17209, 222.17892, 222.18576, 222.18625, 222.18674,
#>        222.21164, 222.23509, 222.23558, 222.32494, 222.32542, 222.37572,
#>        222.45679, 222.45728, 222.46118, 222.58374, 222.58423, 222.62524,
#>        222.62573, 222.63257, 222.63306, 222.76344, 222.78249, 222.79079,
#>        222.81422, 222.91776, 222.91824, 222.91873, 223.00027, 223.00076,
#>        223.13457, 223.2459 , 223.24638, 223.24687, 223.24736, 223.34161],
#>       dtype=float32), intensity=array([0.03284835, 0.03877318, 0.03288539, 0.03720951, 0.05853187,
#>        0.08491591, 0.06331633, 0.03301192, 0.03139719, 0.0336207 ,
#>        0.03562813, 0.04591975, 0.03429356, 0.03502963, 0.03396192,
#>        0.03818281, 0.03267767, 0.04255673, 0.03437934, 0.04328281,
#>        0.04266202, 0.03136031, 0.03447701, 0.04550486, 0.04848193,
#>        0.04093966, 0.03400404, 0.03151353, 0.04374534, 0.03124158,
#>        0.03469666, 0.05565396, 0.07425053, 0.07161437, 0.06504935,
#>        0.07152496, 0.06822967, 0.04492994, 0.03994365, 0.04085354,
#>        0.03125374, 0.03184366, 0.0321136 , 0.03139608, 0.03370965,
#>        0.03504742, 0.03116627, 0.03192464, 0.04326647, 0.04641563,
#>        0.05211566, 0.07108778, 0.09229068, 0.1230322 , 0.1345236 ,
#>        0.09091274, 0.05299176, 0.05250734, 0.05965727, 0.05811873,
#>        0.04186113, 0.03500009, 0.03322943, 0.0339509 , 0.03623006,
#>        0.04652108, 0.03770222, 0.03305039, 0.03956946, 0.04971356,
#>        0.05607868, 0.0540103 , 0.051415  , 0.03203447, 0.03607384,
#>        0.04520367, 0.04587346, 0.04407803, 0.0533353 , 0.09147322,
#>        0.1242162 , 0.107777  , 0.08164912, 0.06181676, 0.04563434,
#>        0.03779434, 0.03900097, 0.05082107, 0.05046627, 0.03096247,
#>        0.03386702, 0.03802484, 0.0368301 , 0.04243016, 0.04828916,
#>        0.04550148, 0.03783446, 0.04280617, 0.06605002, 0.07371385,
#>        0.07734492, 0.117156  , 0.1346142 , 0.09296115, 0.06307497,
#>        0.0562511 , 0.05211168, 0.05525983, 0.0670226 , 0.086752  ,
#>        0.1234378 , 0.1282079 , 0.08913841, 0.06674053, 0.05793724,
#>        0.04502389, 0.04376596, 0.04502972, 0.04501087, 0.05224849,
#>        0.06165801, 0.05180327, 0.03267532, 0.03184385, 0.03972083,
#>        0.03147551, 0.03246397, 0.04341188, 0.03356467, 0.04105018,
#>        0.03546823, 0.03776142, 0.03295065, 0.03648829, 0.03396528,
#>        0.03944029, 0.04025874, 0.03840566, 0.04196307, 0.03797742,
#>        0.03744666, 0.03322563, 0.03087771, 0.04028279, 0.03558374,
#>        0.04901849, 0.06510025, 0.04805615, 0.03715763, 0.04285347,
#>        0.03412285, 0.03438776, 0.0381277 , 0.03895236, 0.03554877,
#>        0.0309805 ], dtype=float32), index=array([  88,  126,  127,  235,  298,  299,  300,  305,  313,  437,  502,
#>         582,  583, 1038, 1043, 1044, 1094, 1095, 1096, 1099, 1100, 1101,
#>        1186, 1190, 1191, 1237, 1238, 1260, 1261, 1262, 1267, 1268, 1269,
#>        1270, 1271, 1272, 1273, 1274, 1285, 1286, 1290, 1295, 1296, 1304,
#>        1308, 1309, 1315, 1323, 1324, 1325, 1326, 1327, 1328, 1329, 1330,
#>        1331, 1332, 1333, 1334, 1335, 1336, 1339, 1343, 1344, 1349, 1350,
#>        1351, 1352, 1353, 1354, 1355, 1356, 1357, 1358, 1367, 1368, 1369,
#>        1370, 1371, 1372, 1373, 1374, 1375, 1376, 1377, 1378, 1379, 1380,
#>        1381, 1384, 1387, 1388, 1389, 1390, 1391, 1392, 1393, 1394, 1395,
#>        1396, 1397, 1398, 1399, 1400, 1401, 1402, 1403, 1404, 1405, 1406,
#>        1407, 1408, 1409, 1410, 1411, 1412, 1413, 1414, 1415, 1416, 1417,
#>        1418, 1432, 1446, 1447, 1448, 1499, 1547, 1548, 1731, 1732, 1835,
#>        2001, 2002, 2010, 2261, 2262, 2346, 2347, 2361, 2362, 2629, 2668,
#>        2685, 2733, 2945, 2946, 2947, 3114, 3115, 3389, 3617, 3618, 3619,
#>        3620, 3813]), spike_method=<function three_sigma_spike at 0x7fbc3deefcb0>, units='GHz')

3.1.1 Adjusting the spike identification method

To adjust the spike identification method, we write a function which returns a boolean value. We have 4 attributes to play with, intensity, frequency, std, and mean. These are all optinally used, so we can write spike identification methods with whatever we want. As an example, if we are looking for two sigma events, we could write:

def two_sigma_spike(x):
  return x.intensity > (x.mean + 2 * x.std)

3.2 Finding associated molecules from the spikes

Next, we query the spectroscopy datasets for our spikes:

molecules = jpl.get_molecules_from_spikes(spikes, save_path="win0.dmp")

Note this will only take a long time once if we use the save_path argument. This saves the molecule information in a standard python dictionary.

4 Set covering

Finally, we can score all the molecules and generate the sets!

import pprint
result = jpl.SetCovering(spikes, molecules)
#> Restructuring data...
#> 
#> 
identifying matches...:   0%|          | 0/67 [00:00<?, ?it/s]
identifying matches...:   1%|1         | 1/67 [00:01<01:53,  1.72s/it]
identifying matches...:   3%|2         | 2/67 [00:04<02:13,  2.06s/it]
identifying matches...:   4%|4         | 3/67 [00:06<02:02,  1.92s/it]
identifying matches...:   6%|5         | 4/67 [00:07<01:44,  1.65s/it]
identifying matches...:   7%|7         | 5/67 [00:09<01:45,  1.71s/it]
identifying matches...:   9%|8         | 6/67 [00:09<01:15,  1.23s/it]
identifying matches...:  10%|#         | 7/67 [00:09<01:02,  1.04s/it]
identifying matches...:  12%|#1        | 8/67 [00:10<00:58,  1.01it/s]
identifying matches...:  13%|#3        | 9/67 [00:11<00:48,  1.19it/s]
identifying matches...:  15%|#4        | 10/67 [00:12<00:56,  1.01it/s]
identifying matches...:  16%|#6        | 11/67 [00:15<01:31,  1.63s/it]
identifying matches...:  18%|#7        | 12/67 [00:16<01:23,  1.52s/it]
identifying matches...:  19%|#9        | 13/67 [00:17<01:01,  1.13s/it]
identifying matches...:  22%|##2       | 15/67 [00:26<01:51,  2.14s/it]
identifying matches...:  24%|##3       | 16/67 [00:26<01:26,  1.70s/it]
identifying matches...:  25%|##5       | 17/67 [00:26<01:02,  1.25s/it]
identifying matches...:  27%|##6       | 18/67 [00:27<00:55,  1.13s/it]
identifying matches...:  28%|##8       | 19/67 [00:27<00:39,  1.22it/s]
identifying matches...:  30%|##9       | 20/67 [00:31<01:12,  1.53s/it]
identifying matches...:  31%|###1      | 21/67 [00:33<01:29,  1.96s/it]
identifying matches...:  33%|###2      | 22/67 [00:36<01:40,  2.23s/it]
identifying matches...:  34%|###4      | 23/67 [00:38<01:25,  1.95s/it]
identifying matches...:  36%|###5      | 24/67 [00:40<01:27,  2.04s/it]
identifying matches...:  37%|###7      | 25/67 [00:41<01:07,  1.61s/it]
identifying matches...:  39%|###8      | 26/67 [00:41<00:55,  1.36s/it]
identifying matches...:  40%|####      | 27/67 [00:43<00:56,  1.42s/it]
identifying matches...:  42%|####1     | 28/67 [00:46<01:11,  1.84s/it]
identifying matches...:  43%|####3     | 29/67 [00:46<00:52,  1.39s/it]
identifying matches...:  45%|####4     | 30/67 [00:48<00:59,  1.62s/it]
identifying matches...:  46%|####6     | 31/67 [00:48<00:43,  1.21s/it]
identifying matches...:  48%|####7     | 32/67 [00:53<01:14,  2.12s/it]
identifying matches...:  49%|####9     | 33/67 [00:53<00:51,  1.52s/it]
identifying matches...:  51%|#####     | 34/67 [00:53<00:37,  1.14s/it]
identifying matches...:  52%|#####2    | 35/67 [00:54<00:34,  1.08s/it]
identifying matches...:  54%|#####3    | 36/67 [00:55<00:32,  1.04s/it]
identifying matches...:  55%|#####5    | 37/67 [00:56<00:35,  1.20s/it]
identifying matches...:  57%|#####6    | 38/67 [00:59<00:49,  1.72s/it]
identifying matches...:  58%|#####8    | 39/67 [01:05<01:18,  2.80s/it]
identifying matches...:  60%|#####9    | 40/67 [01:08<01:22,  3.06s/it]
identifying matches...:  61%|######1   | 41/67 [01:09<00:59,  2.29s/it]
identifying matches...:  64%|######4   | 43/67 [01:10<00:42,  1.77s/it]
identifying matches...:  66%|######5   | 44/67 [01:11<00:36,  1.60s/it]
identifying matches...:  67%|######7   | 45/67 [01:26<02:02,  5.56s/it]
identifying matches...:  69%|######8   | 46/67 [01:50<03:52, 11.05s/it]
identifying matches...:  70%|#######   | 47/67 [01:51<02:39,  7.99s/it]
identifying matches...:  72%|#######1  | 48/67 [01:52<01:55,  6.10s/it]
identifying matches...:  73%|#######3  | 49/67 [01:53<01:20,  4.50s/it]
identifying matches...:  75%|#######4  | 50/67 [01:54<01:00,  3.54s/it]
identifying matches...:  76%|#######6  | 51/67 [02:21<02:48, 10.56s/it]
identifying matches...:  78%|#######7  | 52/67 [02:22<01:52,  7.49s/it]
identifying matches...:  79%|#######9  | 53/67 [02:24<01:21,  5.81s/it]
identifying matches...:  81%|########  | 54/67 [02:24<00:54,  4.22s/it]
identifying matches...:  82%|########2 | 55/67 [02:27<00:45,  3.78s/it]
identifying matches...:  84%|########3 | 56/67 [02:37<01:03,  5.73s/it]
identifying matches...:  85%|########5 | 57/67 [02:40<00:49,  5.00s/it]
identifying matches...:  87%|########6 | 58/67 [02:43<00:38,  4.30s/it]
identifying matches...:  88%|########8 | 59/67 [02:44<00:27,  3.41s/it]
identifying matches...:  90%|########9 | 60/67 [02:55<00:39,  5.69s/it]
identifying matches...:  91%|#########1| 61/67 [02:56<00:24,  4.11s/it]
identifying matches...:  93%|#########2| 62/67 [02:59<00:18,  3.72s/it]
identifying matches...:  94%|#########4| 63/67 [02:59<00:11,  2.76s/it]
identifying matches...:  96%|#########5| 64/67 [03:02<00:07,  2.63s/it]
identifying matches...:  97%|#########7| 65/67 [03:02<00:04,  2.01s/it]
identifying matches...:  99%|#########8| 66/67 [03:03<00:01,  1.71s/it]
identifying matches...: 100%|##########| 67/67 [03:05<00:00,  1.79s/it]
identifying matches...: 100%|##########| 67/67 [03:05<00:00,  2.77s/it]
#> 
identifying sets...:   0%|          | 0/156 [00:00<?, ?it/s]
identifying sets...:  15%|#4        | 23/156 [00:00<00:00, 224.62it/s]
identifying sets...:  30%|###       | 47/156 [00:00<00:00, 216.83it/s]
identifying sets...:  37%|###7      | 58/156 [00:00<00:01, 90.37it/s] 
identifying sets...:  46%|####5     | 71/156 [00:00<00:00, 89.27it/s]
identifying sets...:  51%|#####1    | 80/156 [00:01<00:01, 44.30it/s]
identifying sets...:  60%|#####9    | 93/156 [00:01<00:01, 54.97it/s]
identifying sets...:  65%|######5   | 102/156 [00:01<00:01, 49.06it/s]
identifying sets...:  71%|#######   | 110/156 [00:02<00:02, 15.80it/s]
identifying sets...:  74%|#######4  | 116/156 [00:03<00:03, 13.27it/s]
identifying sets...:  78%|#######7  | 121/156 [00:03<00:02, 16.77it/s]
identifying sets...:  85%|########4 | 132/156 [00:03<00:01, 21.97it/s]
identifying sets...:  93%|#########2| 145/156 [00:03<00:00, 29.00it/s]
identifying sets...: 100%|##########| 156/156 [00:03<00:00, 41.34it/s]
scored_molecules = result.likeliest_molecules()
scored_sets = result.likeliest_sets()
pprint.pprint(scored_molecules)
#> [('Cyanobutadiynylide anion', 1.0),
#>  ('Hydroxyl radical', 1.0),
#>  ('Methyl Cyanide', 1.0),
#>  ('Ammonia', 1.0),
#>  ('Propane', 0.9999999999583432),
#>  ('Methyl Acetylene', 0.9161703896648132),
#>  ('Ethyl Cyanide', 0.8099832333355084),
#>  ('Formamide', 0.6109420482708394),
#>  ('Vinylamine', 0.6082647521369084),
#>  ('<i>anti</i>-Ethanol', 0.42540923657029633),
#>  ('Aminoethanol', 0.2857142277663929),
#>  ('Pyrrole', 0.22772082801623222),
#>  ('&alpha;-Alanine', 0.20958550457779118),
#>  ('Vinylcyanoacetylene', 0.17290649126828636),
#>  ('Benzonitrile', 0.15903824640153372),
#>  ('Peroxynitric acid', 0.15803021057632077),
#>  ('n-Butyronitrile, gauche/anti combined', 0.14009449852619932),
#>  ('Glycolaldehyde', 0.1384634450891849),
#>  ('Chlorine nitrate', 0.098571200733809),
#>  ('Propenoic acid', 0.09555601219383479),
#>  ('Hydrogen peroxide', 0.09037408118828617),
#>  ('Ethylene Glycol', 0.08772099801492059),
#>  ('Carbonyl cyanide', 0.07917975457390376),
#>  ("1,2-propanediol, g'G'g", 0.07065886451173851),
#>  ('Cyanoethenone', 0.06561668770664496),
#>  ('Chlorine dioxide', 0.062289741259311845),
#>  ('Bromine Dioxide', 0.05858545985438459),
#>  ('Acetone', 0.05512240556141538),
#>  ('Anisole', 0.05302893229596831),
#>  ('Diketene', 0.04696095467958493),
#>  ('n-Butyl cyanide', 0.04340865406312234),
#>  ('Tetrasulfur', 0.03850043867000284),
#>  ('Methyl Formate', 0.03249443426799928),
#>  ('Phenol', 0.032381647962831904),
#>  ('gauche-n-butyronitrile, v<sub>29</sub> = 1', 0.027664800540490163),
#>  ('gauche-n-butyronitrile, v<sub>30</sub> = 2', 0.02625203366729838),
#>  ("1,3-propanediol, gGG'g", 0.024132415847265012),
#>  ('Aminoacetonitrile', 0.023838424529676546),
#>  ('anti-n-butyronitrile, v<sub>30</sub> = 2', 0.01925216706224276),
#>  ('Phenylacetylene', 0.015298323848510771),
#>  ('Methyl formate, v<sub>t</sub> = 0, 1', 0.015255375826319039),
#>  ('Chlorine peroxide', 0.011370695145544453),
#>  ('Glycine', 0.010325526430301914),
#>  ("1,2-propanediol, aG'g", 0.00969158152531432),
#>  ('Ethyl cyanide', 0.004260061548544715),
#>  ('Hydroxyacetonitrile', 0.004011715870671322),
#>  ('anti-n-butyronitrile, v<sub>29</sub> = 1', 0.003517007235251116),
#>  ('2-Cyanobutane', 0.0033249337049754374),
#>  ('Sulfuric acid', 0.0026797163812313967),
#>  ("1,2-propanediol, gG'a", 0.001150130377220704),
#>  ('<i>cyclo</i>-Propyl cyanide', 0.0011417077711069284),
#>  ('3-Methylbutyronitrile', 0.001038514035906483),
#>  ('Vinyl Isocyanide', 0.0009273909477218936),
#>  ('iso-propyl cyanide', 0.0008605087284565887),
#>  ('Nitric acid', 2.422384279594392e-05),
#>  ('(Z)-Cyanovinylacetylene', 3.1447455510536433e-06),
#>  ('Vinyl Cyanide', 3.966180009588776e-07),
#>  ('Cyanoallene', 1.9872345977895267e-07),
#>  ('Butyronitrile', 1.6043680634860682e-07),
#>  ('Dimethyl ether', 6.497156324498979e-08),
#>  ('Acetaldehyde', 2.4464085516366753e-08),
#>  ('Cyclopropynylidyne', 1.924516228614758e-08),
#>  ('Ethanol, gauche/anti combined', 1.7689270578734603e-08),
#>  ('Hydroxyacetone', 1.5315561323259803e-08),
#>  ('trans-Ethanol', 8.527885620594696e-09),
#>  ('gauche-Ethanol', 8.527885620594696e-09)]
pprint.pprint(scored_sets[0])
#> [{('(Z)-Cyanovinylacetylene', 3.1447455510536433e-06),
#>   ('<i>cyclo</i>-Propyl cyanide', 0.0011417077711069284),
#>   ('Aminoethanol', 0.2857142277663929),
#>   ('Butyronitrile', 1.6043680634860682e-07),
#>   ('Chlorine peroxide', 0.011370695145544453),
#>   ('Dimethyl ether', 6.497156324498979e-08),
#>   ('Hydroxyl radical', 1.0),
#>   ('Methyl Acetylene', 0.9161703896648132),
#>   ('Methyl Cyanide', 1.0),
#>   ('Peroxynitric acid', 0.15803021057632077),
#>   ('Phenol', 0.032381647962831904),
#>   ('Phenylacetylene', 0.015298323848510771),
#>   ('Propane', 0.9999999999583432),
#>   ('Propenoic acid', 0.09555601219383479),
#>   ('Sulfuric acid', 0.0026797163812313967),
#>   ('Tetrasulfur', 0.03850043867000284),
#>   ('Vinyl Cyanide', 3.966180009588776e-07)},
#>  0.2680498315712268]
fig = result.visualize() # lines is a height scaling parameter

We can visualize this with fig.show(), just like matplotlib: